Goto

Collaborating Authors

 structured prediction theory


On Structured Prediction Theory with Calibrated Convex Surrogate Losses

Neural Information Processing Systems

We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees. For any task loss, we construct a convex surrogate that can be optimized via stochastic gradient descent and we prove tight bounds on the so-called calibration function relating the excess surrogate risk to the actual risk. In contrast to prior related work, we carefully monitor the effect of the exponential number of classes in the learning guarantees as well as on the optimization complexity. As an interesting consequence, we formalize the intuition that some task losses make learning harder than others, and that the classical 0-1 loss is ill-suited for structured prediction.


Structured Prediction Theory Based on Factor Graph Complexity

Neural Information Processing Systems

We present a general theoretical analysis of structured prediction with a series of new results. We give new data-dependent margin guarantees for structured prediction for a very wide family of loss functions and a general family of hypotheses, with an arbitrary factor graph decomposition. These are the tightest margin bounds known for both standard multi-class and general structured prediction problems. Our guarantees are expressed in terms of a data-dependent complexity measure, \emph{factor graph complexity}, which we show can be estimated from data and bounded in terms of familiar quantities for several commonly used hypothesis sets, and a sparsity measure for features and graphs. Our proof techniques include generalizations of Talagrand's contraction lemma that can be of independent interest. We further extend our theory by leveraging the principle of Voted Risk Minimization (VRM) and show that learning is possible even with complex factor graphs. We present new learning bounds for this advanced setting, which we use to devise two new algorithms, \emph{Voted Conditional Random Field} (VCRF) and \emph{Voted Structured Boosting} (StructBoost). These algorithms can make use of complex features and factor graphs and yet benefit from favorable learning guarantees. We also report the results of experiments with VCRF on several datasets to validate our theory.


Reviews: Structured Prediction Theory Based on Factor Graph Complexity

Neural Information Processing Systems

The paper is well written and motivated. In particular the problem considered is relevant. On the downside there are some issues related to the interpretability of the presented results: - In Theorem 1 the generalization error is bounded in terms of the additive or multiplicative empirical margin losses. However their formulation at Eq. (5) and (6) is hard to interpret and would benefit from a comment. This is problematic since it is not clear how these quantities are related to the algorithmic approaches discussed in Sec. 5.


Reviews: On Structured Prediction Theory with Calibrated Convex Surrogate Losses

Neural Information Processing Systems

The paper examines consistency of surrogate losses for multiclass prediction. The authors present their results using the formalism of structured prediction. Alas, there is no direct connection or exploitation of the separability of structured prediction losses. The paper is overly notated and variables are frequently overloaded. I got the feeling that the authors are trying to look mathematically fancy at the expense of readability.


Structured Prediction Theory Based on Factor Graph Complexity

Cortes, Corinna, Kuznetsov, Vitaly, Mohri, Mehryar, Yang, Scott

Neural Information Processing Systems

We present a general theoretical analysis of structured prediction with a series of new results. We give new data-dependent margin guarantees for structured prediction for a very wide family of loss functions and a general family of hypotheses, with an arbitrary factor graph decomposition. These are the tightest margin bounds known for both standard multi-class and general structured prediction problems. Our guarantees are expressed in terms of a data-dependent complexity measure, \emph{factor graph complexity}, which we show can be estimated from data and bounded in terms of familiar quantities for several commonly used hypothesis sets, and a sparsity measure for features and graphs. Our proof techniques include generalizations of Talagrand's contraction lemma that can be of independent interest.


On Structured Prediction Theory with Calibrated Convex Surrogate Losses

Osokin, Anton, Bach, Francis, Lacoste-Julien, Simon

Neural Information Processing Systems

We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees. For any task loss, we construct a convex surrogate that can be optimized via stochastic gradient descent and we prove tight bounds on the so-called "calibration function" relating the excess surrogate risk to the actual risk. In contrast to prior related work, we carefully monitor the effect of the exponential number of classes in the learning guarantees as well as on the optimization complexity. As an interesting consequence, we formalize the intuition that some task losses make learning harder than others, and that the classical 0-1 loss is ill-suited for structured prediction. Papers published at the Neural Information Processing Systems Conference.